Results 1 - 20 of 106
1.
Article in English | MEDLINE | ID: mdl-38742455

ABSTRACT

BACKGROUND: Error analysis plays a crucial role in clinical concept extraction, a fundamental subtask within clinical natural language processing (NLP). The process typically involves manually reviewing error types and the contextual and linguistic factors contributing to their occurrence, and identifying underlying causes to refine the NLP model and improve its performance. Conducting error analysis can be complex, requiring a combination of NLP expertise and domain-specific knowledge. Due to the high heterogeneity of electronic health record (EHR) settings across different institutions, challenges may arise when attempting to standardize and reproduce the error analysis process. OBJECTIVES: This study aims to facilitate a collaborative effort to establish common definitions and taxonomies for capturing diverse error types, fostering community consensus on error analysis for clinical concept extraction tasks. MATERIALS AND METHODS: We iteratively developed and evaluated an error taxonomy based on existing literature, standards, real-world data, multisite case evaluations, and community feedback. The finalized taxonomy was released in both .dtd and .owl formats at the Open Health Natural Language Processing Consortium. The taxonomy is compatible with several open-source annotation tools, including MAE, Brat, and MedTator. RESULTS: The resulting error taxonomy comprises 43 distinct error classes, organized into 6 error dimensions and 4 properties: model type (symbolic and statistical machine learning), evaluation subject (model and human), evaluation level (patient, document, sentence, and concept), and annotation examples. Internal and external evaluations revealed strong variations in error types across methodological approaches, tasks, and EHR settings. Key points emerged from community feedback, including the need to enhance the clarity, generalizability, and usability of the taxonomy, along with dissemination strategies.
CONCLUSION: The proposed taxonomy can help accelerate and standardize the error analysis process in multi-site settings, thus improving the provenance, interpretability, and portability of NLP models. Future research could explore developing automated or semi-automated methods to assist in classifying and standardizing error analysis.

2.
J Biomed Inform ; 152: 104623, 2024 04.
Article in English | MEDLINE | ID: mdl-38458578

ABSTRACT

INTRODUCTION: Functional status reflects patients' independence in performing activities of daily living (ADLs), including basic ADLs (bADL) and more complex instrumental activities (iADL). Existing studies have found that patients' functional status is a strong predictor of health outcomes, particularly in older adults. Despite its usefulness, much functional status information is stored in electronic health records (EHRs) in semi-structured or free-text formats. This underscores the pressing need to leverage computational approaches such as natural language processing (NLP) to accelerate the curation of functional status information. In this study, we introduce FedFSA, a hybrid and federated NLP framework designed to extract functional status information from EHRs across multiple healthcare institutions. METHODS: FedFSA consists of four major components: 1) individual sites (clients) with their private local data, 2) a rule-based information extraction (IE) framework for ADL extraction, 3) a BERT model for functional status impairment classification, and 4) a concept normalizer. The framework was implemented using the OHNLP Backbone for rule-based IE and the open-source Flower and PyTorch libraries for the federated BERT components. For gold standard data generation, we carried out corpus annotation to identify functional status-related expressions based on ICF definitions. Four healthcare institutions were included in the study. To assess FedFSA, we evaluated the performance of category- and institution-specific ADL extraction across different experimental designs. RESULTS: ADL extraction performance ranged from an F1-score of 0.907 to 0.986 for bADL and 0.825 to 0.951 for iADL across the four healthcare sites. The performance for ADL extraction with impairment ranged from an F1-score of 0.722 to 0.954 for bADL and 0.674 to 0.813 for iADL across the four healthcare sites.
For category-specific ADL extraction, laundry and transferring yielded relatively high performance, while dressing, medication, bathing, and continence achieved moderate-high performance. Conversely, food preparation and toileting showed low performance. CONCLUSION: NLP performance varied across ADL categories and healthcare sites. Federated learning using the FedFSA framework outperformed non-federated learning for impaired ADL extraction at all healthcare sites. Our study demonstrated the potential of federated learning for functional status extraction and impairment classification in EHRs, highlighting the importance of large-scale, multi-institutional collaborative development.
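The abstract states that the federated BERT components were built with Flower but does not say which aggregation strategy was used. As a hedged illustration of how federated training combines site models without pooling private notes, the sketch below assumes federated averaging (FedAvg), a common default, and weights each site by its local sample count. The function name `fedavg` and the flat parameter lists are hypothetical simplifications, not FedFSA code.

```python
from typing import List, Sequence, Tuple

# One client update: (flattened model parameters, number of local training examples).
# Hypothetical representation; real frameworks exchange lists of tensors.
ClientUpdate = Tuple[Sequence[float], int]


def fedavg(updates: List[ClientUpdate]) -> List[float]:
    """Average client parameters, weighting each client by its local sample count."""
    if not updates:
        raise ValueError("need at least one client update")
    n_params = len(updates[0][0])
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged = [0.0] * n_params
    for params, n in updates:
        if len(params) != n_params:
            raise ValueError("all clients must share the same parameter shape")
        for i, p in enumerate(params):
            # Each site contributes in proportion to how much data it trained on.
            averaged[i] += p * n / total
    return averaged


# Hypothetical example: two sites with different amounts of local data.
site_a = ([1.0, 2.0], 300)
site_b = ([3.0, 4.0], 100)
print(fedavg([site_a, site_b]))  # [1.5, 2.5]
```

Only parameters and sample counts leave each site in this scheme; the raw EHR text stays local, which is the property that makes multi-institutional training feasible.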


Subject(s)
Activities of Daily Living , Functional Status , Humans , Aged , Learning , Information Storage and Retrieval , Natural Language Processing
3.
Stud Health Technol Inform ; 310: 850-854, 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38269929

ABSTRACT

As the number of people living with dementia grows, late diagnosis significantly affects quality of life, while early signs of dementia may provide useful insights to facilitate better treatment plans. Over time, this progressive neurodegenerative syndrome can advance from mild cognitive impairment to dementia. A pattern of health conditions can be characterized in an unsupervised manner to help predict this progression. As a significant extension of our previous work with a streaming clustering model, we consider additional information for predicting dementia onset. Through empirical observation, we found that sex and age are important for predicting dementia onset. To this end, we propose a sex-specific model with an age constraint for predicting dementia onset and validate the effectiveness of our models using data from the Mayo Clinic Study of Aging (MCSA). The proposed sex-specific models for older adult populations (≥65 years of age) outperformed the previous models, with F-scores of 77% and 78% for the male-specific and female-specific models, respectively. Our experiments with sex-specific temporal clustering of features in older adults demonstrate the potential of more personalized models for early alerts of dementia.


Subject(s)
Cognitive Dysfunction , Dementia , Humans , Female , Male , Aged , Quality of Life , Aging , Cluster Analysis , Cognitive Dysfunction/diagnosis , Dementia/diagnosis
4.
J Biomed Inform ; 150: 104586, 2024 02.
Article in English | MEDLINE | ID: mdl-38191011

ABSTRACT

BACKGROUND: Halbert L. Dunn's concept of wellness is multidimensional, encompassing social and mental well-being. Neglecting these dimensions over time can negatively affect an individual's mental health. The manual efforts employed in in-person therapy sessions reveal that underlying factors of mental disturbance, if triggered, may lead to severe mental health disorders. OBJECTIVE: We introduce a fine-grained approach focused on identifying indicators of wellness dimensions and marking their presence in self-narrated writing on the Reddit social media platform. DESIGN AND METHOD: We present MultiWD, a curated dataset of 3281 instances specifically designed and annotated to facilitate the identification of multiple wellness dimensions in Reddit posts. We introduce the task of identifying wellness dimensions and use state-of-the-art classifiers to solve this multi-label classification task. RESULTS: Our findings show that fine-tuned large language models perform comparably to a fine-tuned BERT model. We therefore set BERT as a baseline model to tag wellness dimensions in user-written text, with an F1 score of 76.69. CONCLUSION: Our findings underscore the need for trustworthy, domain-specific knowledge infusion to develop more comprehensive and contextually aware AI models for tagging and extracting wellness dimensions.


Subject(s)
Mental Disorders , Social Media , Humans , Mental Health , Awareness
6.
J Clin Transl Sci ; 7(1): e187, 2023.
Article in English | MEDLINE | ID: mdl-37745932

ABSTRACT

Introduction: We tested the ability of our natural language processing (NLP) algorithm to identify delirium episodes in a large-scale study using real-world clinical notes. Methods: We used the Rochester Epidemiology Project to identify persons ≥ 65 years who were hospitalized between 2011 and 2017. We identified all persons with an International Classification of Diseases (ICD) code for delirium within ±14 days of a hospitalization. We independently applied our NLP algorithm to all clinical notes for this same population. We calculated rates using the number of delirium episodes as the numerator and the number of hospitalizations as the denominator. Rates were estimated overall, by demographic characteristics, and by year of episode, and differences were tested using Poisson regression. Results: In total, 14,255 persons had 37,554 hospitalizations between 2011 and 2017. The code-based delirium rate was 3.02 per 100 hospitalizations (95% CI: 2.85, 3.20). The NLP-based rate was 7.36 per 100 (95% CI: 7.09, 7.64). Rates increased with age (both p < 0.0001). Code-based rates were higher in men than in women (p = 0.03), but NLP-based rates were similar by sex (p = 0.89). Code-based rates were similar by race and ethnicity, but NLP-based rates were higher in the White population than in the Black and Asian populations (p = 0.001). Both types of rates increased significantly over time (both p values < 0.001). Conclusions: The NLP algorithm identified more delirium episodes than the ICD code method. However, NLP may still underestimate delirium cases because of limitations in real-world clinical notes, including incomplete documentation, practice changes over time, and missing clinical notes in some time periods.
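The rate definition above (episodes as numerator, hospitalizations as denominator, scaled per 100) can be sketched as follows. This is a crude-rate illustration only: the study tested differences with Poisson regression, and the confidence interval here is an assumed log-scale Wald approximation for a Poisson count, not necessarily the interval method the authors used. The counts are hypothetical.

```python
import math


def rate_per_100(episodes: int, hospitalizations: int) -> float:
    """Crude rate: delirium episodes per 100 hospitalizations."""
    if hospitalizations <= 0:
        raise ValueError("hospitalizations must be positive")
    return 100.0 * episodes / hospitalizations


def approx_ci_per_100(episodes: int, hospitalizations: int, z: float = 1.96) -> tuple:
    """Approximate 95% CI, treating the episode count as Poisson (Wald interval on the log rate)."""
    if episodes <= 0:
        raise ValueError("a log-scale interval needs at least one episode")
    rate = rate_per_100(episodes, hospitalizations)
    half_width = z / math.sqrt(episodes)  # SE of log(rate) is about 1/sqrt(events)
    return rate * math.exp(-half_width), rate * math.exp(half_width)


# Hypothetical counts for illustration only (not Rochester Epidemiology Project data).
print(rate_per_100(30, 1000))  # 3.0
low, high = approx_ci_per_100(30, 1000)
```

Because the denominator is hospitalizations rather than persons, one person with several admissions contributes several opportunities for an episode.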

7.
J Arthroplasty ; 38(10): 1948-1953, 2023 10.
Article in English | MEDLINE | ID: mdl-37619802

ABSTRACT

Total joint arthroplasty is becoming one of the most common surgeries in the United States, creating an abundance of analyzable data to improve patient experience and outcomes. Unfortunately, a large majority of this data is locked in electronic health records and accessible only through manual extraction, which takes extensive time and resources. Natural language processing (NLP), a field within artificial intelligence, may offer a viable alternative to manual extraction. Using NLP, a researcher can analyze written and spoken language and extract data in an organized form suitable for future research and clinical use. This article first discusses common subtasks involved in an NLP pipeline, including data preparation, modeling, analysis, and external validation, followed by examples of NLP projects. Challenges and limitations of NLP are then discussed, closing with future directions for NLP projects, including large language models.


Subject(s)
Artificial Intelligence , Natural Language Processing , Humans , Arthroplasty , Language , Electronic Health Records
8.
J Alzheimers Dis ; 95(3): 931-940, 2023.
Article in English | MEDLINE | ID: mdl-37638438

ABSTRACT

BACKGROUND: Multiple algorithms with variable performance have been developed to identify dementia using combinations of billing codes and medication data that are widely available from electronic health records (EHR). If the characteristics of misclassified patients are clearly identified, it may be possible to modify existing algorithms to improve performance. OBJECTIVE: To examine the performance of a code-based algorithm to identify dementia cases in the population-based Mayo Clinic Study of Aging (MCSA), where dementia diagnosis (i.e., the reference standard) is actively assessed through routine follow-up, and to describe the characteristics of persons incorrectly categorized. METHODS: There were 5,316 participants (age at baseline, mean (SD): 73.3 (9.68) years; 50.7% male) without dementia at baseline and with available EHR data. ICD-9/10 codes and prescription medications for dementia were extracted between baseline and one year after an MCSA dementia diagnosis or last follow-up. Fisher's exact or Kruskal-Wallis tests were used to compare characteristics between groups. RESULTS: Algorithm sensitivity and specificity were 0.70 (95% CI: 0.67, 0.74) and 0.95 (95% CI: 0.95, 0.96). False positives (i.e., participants falsely diagnosed with dementia by the algorithm) were older, had a higher Charlson comorbidity index, were more likely to have mild cognitive impairment (MCI), and had longer follow-up (versus true negatives). False negatives (versus true positives) were older, more likely to have MCI, and had more functional limitations. CONCLUSIONS: We observed moderately high performance of the code-based diagnosis method against the population-based MCSA reference standard dementia diagnosis. Older participants and those with MCI at baseline were more likely to be misclassified.
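Sensitivity and specificity here compare the code-based algorithm against the MCSA reference-standard diagnosis. A minimal sketch of the two ratios, using hypothetical confusion counts chosen only to illustrate the arithmetic:

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positives / all reference-standard dementia cases."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negatives / all reference-standard non-cases."""
    return tn / (tn + fp)


# Hypothetical confusion counts, chosen only to illustrate the arithmetic (not MCSA data).
tp, fn, tn, fp = 70, 30, 950, 50
print(sensitivity(tp, fn), specificity(tn, fp))  # 0.7 0.95
```

False negatives lower sensitivity and false positives lower specificity, which is why the abstract profiles each misclassified group separately.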


Subject(s)
Alzheimer Disease , Cognitive Aging , Cognitive Dysfunction , Dementia , Humans , Male , Female , Dementia/diagnosis , Dementia/epidemiology , Alzheimer Disease/diagnosis , Disease Progression , Cognitive Dysfunction/diagnosis , Cognitive Dysfunction/epidemiology
9.
BMJ Open ; 13(4): e069375, 2023 04 21.
Article in English | MEDLINE | ID: mdl-37085302

ABSTRACT

OBJECTIVE: Ceramides have been associated with several ageing-related conditions but have not been studied as a general biomarker of multimorbidity (MM). Therefore, we determined whether ceramide levels are associated with the rapid development of MM. DESIGN: Retrospective cohort study. SETTING: Mayo Clinic Biobank. PARTICIPANTS: 1809 persons in the Mayo Clinic Biobank ≥65 years without MM at the time of enrolment, and with ceramide levels assayed from stored plasma. PRIMARY OUTCOME MEASURE: Persons were followed for a median of 5.7 years through their medical records to identify new diagnoses of 20 chronic conditions. The number of new conditions was divided by the person-years of follow-up to calculate the rate of accumulation of new chronic conditions. RESULTS: Higher levels of C18:0 and C20:0 were associated with a more rapid rate of accumulation of chronic conditions (C18:0 z score RR: 1.30, 95% CI: 1.10 to 1.53; C20:0 z score RR: 1.26, 95% CI: 1.07 to 1.49). Higher C18:0 and C20:0 levels were also associated with an increased risk of hypertension and coronary artery disease. CONCLUSIONS: C18:0 and C20:0 were associated with an increased risk of cardiometabolic conditions. When combined with biomarkers specific to other diseases of ageing, these ceramides may be a useful component of a biomarker panel for predicting accelerated ageing.
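The primary outcome divides the number of new chronic conditions by person-years of follow-up, and the rate ratios are reported per ceramide z score, which conventionally means per one-standard-deviation increase. The sketch below shows both steps under that assumption; the abstract does not state how the standard deviation was computed, and the numbers are hypothetical.

```python
import statistics
from typing import List, Sequence


def accumulation_rate(new_conditions: int, person_years: float) -> float:
    """Rate of accumulation: new chronic conditions per person-year of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_conditions / person_years


def z_scores(levels: Sequence[float]) -> List[float]:
    """Standardize assay levels so a one-unit change equals one sample standard deviation."""
    mean = statistics.mean(levels)
    sd = statistics.stdev(levels)
    return [(x - mean) / sd for x in levels]


# Hypothetical person: 3 new conditions over 6 years of follow-up.
print(accumulation_rate(3, 6.0))  # 0.5
print(z_scores([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```

Under this reading, an RR of 1.30 for C18:0 means each one-SD increase in C18:0 is associated with a 30% higher rate of accumulating chronic conditions.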


Subject(s)
Ceramides , Multimorbidity , Humans , Cohort Studies , Risk Factors , Biological Specimen Banks , Retrospective Studies , Biomarkers , Chronic Disease
10.
Arch Bone Jt Surg ; 11(1): 1-11, 2023.
Article in English | MEDLINE | ID: mdl-36793660

ABSTRACT

Background: Knee osteoarthritis (OA) is a prevalent joint disease. Clinical prediction models consider a wide range of risk factors for knee OA. This review aimed to evaluate published prediction models for knee OA and identify opportunities for future model development. Methods: We searched Scopus, PubMed, and Google Scholar using the terms knee osteoarthritis, prediction model, deep learning, and machine learning. All identified articles were reviewed by one of the researchers, and we recorded information on methodological characteristics and findings. We only included articles that were published after 2000 and reported a knee OA incidence or progression prediction model. Results: We identified 26 models, of which 16 were traditional regression-based models and 10 were machine learning (ML) models. Four traditional and five ML models relied on data from the Osteoarthritis Initiative. There was significant variation in the number and type of risk factors. The median sample size for traditional and ML models was 780 and 295, respectively. The reported Area Under the Curve (AUC) ranged between 0.6 and 1.0. Regarding external validation, 6 of the 16 traditional models and only 1 of the 10 ML models validated their results in an external data set. Conclusion: Diverse use of knee OA risk factors, small, non-representative cohorts, and the use of magnetic resonance imaging, which is not a routine evaluation tool for knee OA in daily clinical practice, are some of the main limitations of current knee OA prediction models.

11.
Sci Rep ; 13(1): 1971, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36737471

ABSTRACT

The electronic Medical Records and Genomics (eMERGE) Network assessed the feasibility of deploying portable phenotype rule-based algorithms with natural language processing (NLP) components added to improve the performance of existing algorithms using electronic health records (EHRs). Based on scientific merit and predicted difficulty, eMERGE selected six existing phenotypes to enhance with NLP. We assessed performance, portability, and ease of use. We summarized lessons learned by: (1) challenges; (2) best practices to address challenges based on existing evidence and/or eMERGE experience; and (3) opportunities for future research. Adding NLP resulted in improved, or the same, precision and/or recall for all but one algorithm. Portability, phenotyping workflow/process, and technology were major themes. With NLP, development and validation took longer. Besides portability of NLP technology and algorithm replicability, factors to ensure success include privacy protection, technical infrastructure setup, intellectual property agreement, and efficient communication. Workflow improvements can improve communication and reduce implementation time. NLP performance varied mainly due to clinical document heterogeneity; therefore, we suggest using semi-structured notes, comprehensive documentation, and customization options. NLP portability is possible with improved phenotype algorithm performance, but careful planning and architecture of the algorithms are essential to support local customizations.


Subject(s)
Electronic Health Records , Natural Language Processing , Genomics , Algorithms , Phenotype
12.
J Arthroplasty ; 38(10): 2081-2084, 2023 10.
Article in English | MEDLINE | ID: mdl-36280160

ABSTRACT

BACKGROUND: Natural language processing (NLP) systems are distinctive in their ability to extract critical information from raw text in electronic health records (EHR). We previously developed three algorithms for total hip arthroplasty (THA) operative notes with rules aimed at capturing (1) operative approach, (2) fixation method, and (3) bearing surface using inputs from a single institution. The purpose of this study was to externally validate and improve these algorithms as a prerequisite for broader adoption in automated registry data curation. METHODS: The previous NLP algorithms developed at Mayo Clinic were deployed and refined on EHRs from OrthoCarolina, evaluating 39 randomly selected primary THA operative reports from 2018 to 2021. Operative reports were available only in PDF format, requiring conversion to "readable" text with Adobe software. Accuracy statistics were calculated against manual chart review. RESULTS: The operative approach, fixation technique, and bearing surface algorithms all demonstrated perfect accuracy of 100%. By comparison, validated performance at the developing center yielded an accuracy of 99.2% for operative approach, 90.7% for fixation technique, and 95.8% for bearing surface. CONCLUSION: NLP algorithms applied to data from an external center demonstrated excellent accuracy in delineating common elements in THA operative notes. Notably, the algorithms had no functional problems evaluating scanned PDFs that were converted to "readable" text by common software. Taken together, these findings provide promise for NLP applied to scanned PDFs as a source to develop large registries by reliably extracting data of interest from very large unstructured data sets in an expeditious and cost-effective manner.


Subject(s)
Arthroplasty, Replacement, Hip , Humans , Natural Language Processing , Common Data Elements , Algorithms , Software , Electronic Health Records
13.
Am J Med Qual ; 38(1): 17-22, 2023.
Article in English | MEDLINE | ID: mdl-36283056

ABSTRACT

Delirium is known to be underdiagnosed and underdocumented. Delirium detection in retrospective studies relies mostly on clinician diagnosis or nursing documentation. This study aims to assess the effectiveness of the natural language processing-confusion assessment method (NLP-CAM) algorithm compared with conventional modalities of delirium detection. A multicenter retrospective study analyzed 4351 records of patients hospitalized with COVID-19 to identify delirium occurrence using three delirium detection modalities: clinician diagnosis, nursing documentation, and the NLP-CAM algorithm. For comparison, delirium detection by any of the 3 methods was considered positive for delirium occurrence. NLP-CAM captured 80% of overall delirium, followed by clinician diagnosis at 55% and nursing flowsheet documentation at 43%. Increasing age, Charlson comorbidity score, and length of hospitalization were associated with higher odds of delirium detection regardless of the detection method. Compared with conventional methods, the artificial intelligence-based NLP-CAM algorithm improved delirium detection from electronic health records and holds promise in delirium diagnostics.
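Because a patient counts as delirium-positive if any of the three methods flags them, each method's capture percentage is its share of that union. The sketch below shows this calculation with hypothetical patient identifiers; the method keys are illustrative names, not fields from the study data.

```python
from typing import Dict, Set


def capture_rates(detections: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of all delirium-positive patients (union of methods) each method finds."""
    union: Set[str] = set().union(*detections.values())
    if not union:
        raise ValueError("no positive patients in any method")
    return {name: len(found) / len(union) for name, found in detections.items()}


# Hypothetical patient IDs; not study data.
detections = {
    "nlp_cam": {"p1", "p2", "p3", "p4"},
    "clinician": {"p1", "p5"},
    "nursing": {"p2"},
}
print(capture_rates(detections))  # {'nlp_cam': 0.8, 'clinician': 0.4, 'nursing': 0.2}
```

The percentages need not sum to 100% because one patient can be detected by more than one method.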


Subject(s)
COVID-19 , Delirium , Humans , Delirium/diagnosis , Delirium/epidemiology , Retrospective Studies , Artificial Intelligence , Natural Language Processing , COVID-19/diagnosis , Algorithms
14.
Conf Proc IEEE Int Conf Syst Man Cybern ; 2023: 3854-3859, 2023 Oct.
Article in English | MEDLINE | ID: mdl-38524640

ABSTRACT

Low self-esteem and interpersonal needs (i.e., thwarted belongingness (TB) and perceived burdensomeness (PB)) have a major impact on depression and suicide attempts. Individuals seek social connectedness on social media to alleviate their loneliness. Social media platforms allow people to express their thoughts, experiences, beliefs, and emotions. Prior studies on mental health from social media have focused on symptoms, causes, and disorders. By contrast, an initial screening of social media content for interpersonal risk factors and low self-esteem may raise early alerts and help assign therapists to users at risk of mental disturbance. Standardized scales measure self-esteem and interpersonal needs using questions grounded in psychological theories. In the current research, we introduce a psychology-grounded and expertly annotated dataset, LoST: Low Self esTeem, to study and detect low self-esteem on Reddit. Through an annotation approach involving checks on coherence, correctness, consistency, and reliability, we ensure a gold standard for supervised learning. We present results from different deep language models tested using two data augmentation techniques. Our findings suggest the need to develop a class of language models that infuse psychological and clinical knowledge.

15.
Article in English | MEDLINE | ID: mdl-38404695

ABSTRACT

Dementia is among the leading causes of cognitive and functional loss and disability in older adults. Past studies have suggested sex differences in health conditions and in the progression of cognitive decline. However, studies on the temporal trajectory of health conditions after dementia diagnosis, which could support patient characterization, are scarce and inconclusive. To this end, we aim to analyze shifts in medical conditions and examine sex-specific changes in patterns of chronic health conditions after dementia diagnosis. We centered our analysis on a 15-year window around the point of dementia diagnosis, encompassing the 5 years leading up to the diagnosis and the 10 years following it. We introduce (i) MedMet, a network metric to quantify the contribution of each medical condition, and (ii) a growth and decay function for temporal trajectory analysis of medical conditions. Our experiments demonstrate that certain health conditions are more prevalent among females than males. Thus, our findings underscore the pressing need to examine differences between men and women, which could be important for healthcare utilization after a dementia diagnosis.

16.
Article in English | MEDLINE | ID: mdl-38404694

ABSTRACT

This paper presents a machine learning-based approach to predicting dementia, leveraging transfer learning to reuse knowledge learned from predicting mild cognitive impairment, a precursor of dementia. We also examine the impact of temporal aspects of longitudinal data and of sex differences. The methodology encompasses key components such as setting the duration window, comparing different modeling strategies, conducting comprehensive evaluations, and examining the sex-specific impacts of simulated scenarios. The findings reveal that cognitive deficits in females, once detected at the mild cognitive impairment stage, tend to deteriorate over time, while males exhibit more diverse decline across various characteristics, with no specific ones standing out. However, the underlying reasons for these sex differences remain unknown and warrant further investigation.

17.
IEEE Int Conf Healthc Inform ; 2023: 581-587, 2023 Jun.
Article in English | MEDLINE | ID: mdl-38384500

ABSTRACT

Alongside advances in the analysis of cognitive decline in electronic health records, the research community has witnessed a recent surge in social media posts by caregivers and/or loved ones of people with cognitive decline. The major challenges in this area are the availability of large and diverse datasets, the ethics of data collection and sharing, diagnostic specificity, and clinical acceptability. To this end, we construct a new dataset, Caregivers' experiences with cognitive Decline (CareD), of 1005 posts with more than 194K words and 9541 sentences, highlighting discussions on people with dementia and Alzheimer's disease on Reddit. We discuss the changing trends of discussions on cognitive decline in social media and open challenges for natural language processing and social computing. We first identify Reddit posts that carry substantial information as candidate posts. We then formulate annotation guidelines and resolve ambiguities to investigate the presence of experiences, self-reported articles, and a potential caregiver in candidate posts, leading to the discovery of latent symptoms, firsthand information, and a prospective source of longitudinal information about the patient, respectively.

18.
Proc Conf Assoc Comput Linguist Meet ; 2023: 306-312, 2023 Jul.
Article in English | MEDLINE | ID: mdl-38384674

ABSTRACT

Amid the ongoing health crisis, there is a growing need to discern possible signs of Wellness Dimensions (WD) manifested in self-narrated text. As the distribution of WD in social media data is intrinsically imbalanced, we experiment with generative NLP models for data augmentation to further improve the pre-screening task of classifying WD. To this end, we propose a simple yet effective data augmentation approach using prompt-based generative NLP models, and evaluate the ROUGE scores and syntactic/semantic similarity between existing interpretations and augmented data. Our approach with the ChatGPT model surpasses all other methods and improves over baselines such as Easy Data Augmentation and backtranslation. Introducing data augmentation to generate more training samples and a balanced dataset improves the F-score and the Matthews Correlation Coefficient by up to 13.11% and 15.95%, respectively.
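The Matthews Correlation Coefficient is a natural companion to F-score on imbalanced labels because it uses all four cells of the confusion matrix. The sketch below computes it for a single binary label; the abstract does not say how MCC was aggregated across wellness dimensions, so any per-dimension averaging would be an additional assumption. The counts are hypothetical.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for one binary label."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when any row or column total is zero
    return (tp * tn - fp * fn) / denom


# Hypothetical imbalanced label: 10 positives, 90 negatives.
# Predicting "negative" for everything gives 90% accuracy but an MCC of 0.
print(matthews_corrcoef(tp=0, tn=90, fp=0, fn=10))  # 0.0
print(matthews_corrcoef(tp=10, tn=90, fp=0, fn=0))  # 1.0
```

A classifier that ignores the minority class scores zero here despite high accuracy, which is the failure mode that rebalancing through augmentation targets.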

19.
IEEE Int Conf Healthc Inform ; 2022: 211-216, 2022 Jun.
Article in English | MEDLINE | ID: mdl-36484060

ABSTRACT

Dementia is one of the major health challenges in aging populations, with 50 million people diagnosed worldwide. However, dementia is often underdiagnosed or diagnosed late, resulting in missed opportunities for appropriate care plans. Identifying early signs of dementia is essential for improving the quality of life of aging populations. Monitoring early signs of individual health changes could help clinicians diagnose dementia in its early stages and plan more effective treatment. However, the scarcity of dementia cases relative to normal cases (i.e., an imbalanced class distribution) makes it challenging to develop robust supervised learning models. To alleviate this issue, we investigated one-class classification (OCC) techniques, which use only the majority class (i.e., normal cases) in model development, to detect dementia signals from older adults' clinical visits. The OCC models identify abnormalities in older adults' longitudinal health conditions to predict incident dementia. The predictive performance of OCC was compared with a recent streaming clustering-based technique and demonstrated higher predictive power. Our analysis showed that OCC has promising potential to increase power in predicting dementia.
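The core idea of one-class classification is to fit a model to normal cases only and flag anything that falls outside the learned region. The abstract does not specify which OCC models were used, so the sketch below swaps in a deliberately simple centroid-distance detector as an illustration of the technique, not the authors' method. The feature vectors, the 0.95 quantile, and all function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def fit_centroid(normal: Sequence[Vector]) -> List[float]:
    """Mean feature vector computed from majority-class (normal) visits only."""
    if not normal:
        raise ValueError("need at least one normal example")
    n = len(normal)
    return [sum(column) / n for column in zip(*normal)]


def fit_threshold(normal: Sequence[Vector], centroid: Vector, quantile: float = 0.95) -> float:
    """Distance below which roughly `quantile` of the normal training data falls."""
    dists = sorted(math.dist(x, centroid) for x in normal)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return dists[idx]


def is_abnormal(x: Vector, centroid: Vector, threshold: float) -> bool:
    """Flag a new visit whose features lie farther from the centroid than the threshold."""
    return math.dist(x, centroid) > threshold


# Hypothetical 2-feature vectors standing in for longitudinal health-condition features.
normal_visits = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
centroid = fit_centroid(normal_visits)  # [0.5, 0.5]
threshold = fit_threshold(normal_visits, centroid)
print(is_abnormal([5.0, 5.0], centroid, threshold))  # True
print(is_abnormal([0.5, 0.5], centroid, threshold))  # False
```

No dementia cases are needed to fit this detector, which is what lets OCC sidestep the class-imbalance problem described above; practical OCC models such as one-class SVMs learn more flexible boundaries than a single radius.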

20.
Front Digit Health ; 4: 958539, 2022.
Article in English | MEDLINE | ID: mdl-36238199

ABSTRACT

The secondary use of electronic health records (EHRs) faces challenges from a range of data quality issues. To address this, we retrospectively assessed the quality of functional status documentation in the EHRs of persons participating in the Mayo Clinic Study of Aging (MCSA). We used a convergent parallel design to collect quantitative and qualitative data and analyzed the findings independently. We discovered a heterogeneous documentation process, in which care practice teams, institutions, and EHR systems all play an important role in how text data is documented and organized. Four prevalent instrument-assisted documentation (iDoc) expressions were identified based on three distinct instruments: Epic smart form, questionnaire, and occupational therapy and physical therapy templates. We found strong differences in the usage, information quality (intrinsic and contextual), and naturalness of language among different types of iDoc expressions. These variations can be caused by different source instruments, information providers, practice settings, care events, and institutions. In addition, iDoc expressions are context specific and thus should not be viewed and processed uniformly. We recommend conducting data quality assessment of unstructured EHR text before using the information.
